A Linear Size Index for Approximate Pattern Matching
Abstract
This paper revisits the problem of indexing a text S[1..n] to support searching for substrings of S that match a given pattern P[1..m] with at most k errors. A naive solution either has a worst-case matching time of Ω(m^k) or requires Ω(n^k) space. Devising a solution with better performance remained a challenge until Cole et al. [5] gave an O(n log^k n)-space index that supports k-error matching in O(m + occ + log^k n log log n) time, where occ is the number of occurrences. Motivated by the indexing of DNA, we investigate in this paper the feasibility of devising a linear-size index that still has a time complexity linear in m. In particular, we give an O(n)-space index that supports k-error matching in O(m + occ + (log n)^(k(k+1)) log log n) worst-case time. Furthermore, the index can be compressed from O(n) words to O(n) bits with a slight increase in the time complexity.
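To make the k-error matching problem concrete, the sketch below is an index-free baseline and not the index described in the abstract. It uses the classic dynamic-programming scan, which takes O(nm) time because it reads the whole text for every query. It assumes "errors" means edit operations (substitution, insertion, deletion). The function name `k_error_end_positions` is hypothetical and chosen only for illustration.

```python
def k_error_end_positions(text: str, pattern: str, k: int) -> list[int]:
    """Return 1-based positions j such that some substring of `text`
    ending at j matches `pattern` with at most k edit errors.

    Illustrative baseline only: it scans the whole text in O(n*m) time,
    which is the cost an index is meant to avoid.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any substring
    # of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # value for (i-1, j-1); col[0] is always 0
        for i in range(1, m + 1):
            cur = col[i]  # value for (i, j-1), saved before overwriting
            col[i] = min(
                prev_diag + (pattern[i - 1] != c),  # match or substitution
                cur + 1,                            # extra character in the text
                col[i - 1] + 1,                     # extra character in the pattern
            )
            prev_diag = cur
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(k_error_end_positions("acgtacgt", "cgt", 0))
    print(k_error_end_positions("abcdef", "bxd", 1))
```

Because `col[0]` stays 0, a match may start at any text position, which is what distinguishes substring matching from whole-string edit distance.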
Similar resources
Parameterized matching on non-linear structures
The classical pattern matching paradigm is that of seeking occurrences of one string in another, where both strings are drawn from an alphabet set Σ. In the parameterized pattern matching model, a consistent renaming of symbols from Σ is allowed in a match. The parameterized matching paradigm has proven useful in problems in software engineering, computer vision, and other applications. In clas...
Indexes for Jumbled Pattern Matching in Strings, Trees and Graphs
We consider how to index strings, trees and graphs for jumbled pattern matching when we are asked to return a match if one exists. For example, we show how, given a tree containing two colours, we can build a quadratic-space index with which we can find a match in time proportional to the size of the match. We also show how we need only linear space if we are content with approximate matches.
An Index for Two Dimensional String Matching Allowing Rotations
We present an index to search a two-dimensional pattern of size m × m in a two-dimensional text of size n × n, even when the pattern appears rotated in the text. The index is based on (path compressed) tries. By using O(n^2) (i.e. linear) space the index can search the pattern in O((log_σ n)^(5/2)) time on average, where σ is the alphabet size. We also consider various schemes for approximate matching,...
FAMOUS: Fast Approximate string Matching using OptimUm search Schemes
Finding approximate occurrences of a pattern in a text using a full-text index is a central problem in bioinformatics and has been extensively researched. The introduction of practical bidirectional indices has opened new possibilities for solving the problem as they allow the search to be started from anywhere within the pattern and extended in both directions. In particular, use of search sch...
A Hybrid Indexing Method for Approximate String Matching
We present a new indexing method for the approximate string matching problem. The method is based on a suffix array combined with a partitioning of the pattern. We analyze the resulting algorithm and show that the average retrieval time is O(n^λ log n), for some λ > 0 that depends on the error fraction tolerated α and the alphabet size σ. It is shown that λ < 1 for approximately α < 1 − e/√σ, where e = 2.718.... The space required is four times...